
feat: console observability + real batch processing#36

Open
been-there-done-that wants to merge 21 commits into main from ui


@been-there-done-that
Collaborator

Summary

  • Real batch processing: replaces placeholder stub with actual Chromium/LibreOffice engine calls, real ZIP/merge output, uploaded file persistence
  • Console accuracy: engine conv chart, concurrency panel, and sampler cadence all now reflect batch load correctly
  • Error metric split: the ticker's single error block splits into 5XX (server errors) and 429 (rate limits); all other 4xx client errors are excluded from both
  • Banner accuracy: startup banner waits for engines before printing, shows lazy / ready / unavailable accurately
  • Sampler reliability: sampler uses is_alive() (atomic) instead of healthy().await (engine mutex) — prevents 5s stall during heavy batch load
  • Batches panel: fixed-height scrollable list, shows only active jobs, header has queued/running/done counts
  • Dev environment: fixes for Docker named volume ownership, bench crate, rust-embed stub

Test plan

  • make dev starts cleanly with both engines showing [OK] ready in banner
  • ./scripts/load_test.sh runs 3 waves; Engine Conversions chart fills, Concurrency panel spikes, Batches panel scrolls and counts correctly
  • Deliberate 4xx requests in load script do not affect 5XX or 429 ticker blocks
  • After batch jobs complete, Batches panel list empties and done count in header increments
  • Sampler continues broadcasting every 5s during heavy batch load (no stall)

🤖 Generated with Claude Code

been-there-done-that and others added 21 commits May 6, 2026 01:34
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove chromium/LO status fields from TickerPayload constructor
- Add placeholder p50_ms/p55_ms (0.0) to TickerPayload constructor
- Add conversions_total/error_rate/bytes_mb/idle_secs placeholders to both EnginePayload constructors
- Add queue_wait_p95_ms/queue_processing placeholders to ConcurrencyPayload
- Add ts_series/chromium_conv_series/libreoffice_conv_series/queue_wait_p95_series placeholders to ThroughputPayload

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rectly

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…n sampler

Add global_histogram_pct, engine_conv_total, engine_bytes_total helpers.
Sampler now populates all MetricsSample fields with real values.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Extract ts/conv/queue_wait series from history ring buffer
- TickerPayload now uses real p50_ms/p55_ms from history
- EnginePayload reads conv stats, error rate, bytes, idle time from Prometheus
- ConcurrencyPayload reads queue_wait_p95_ms and queue_processing live
- ThroughputPayload carries all new time series
- Add idle_secs() to PdfBackend trait (default 0) + ChromiumBackend impl

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…er data

Replaces placeholder vec![] with live batch data: progress_pct, elapsed,
status badges, item counts, output mode. Sorted running-first, capped at 10.
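
The "running-first, capped at 10" ordering can be sketched as follows. This is a minimal illustration, not the PR's code: `BatchStatus`, `BatchRow`, and `top_rows` are hypothetical names, and placing queued jobs ahead of done jobs is an assumption (the commit only says running jobs come first).

```rust
/// Hypothetical status type; the real batch state enum is not shown in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BatchStatus {
    Queued,
    Running,
    Done,
}

/// Hypothetical row type carrying a few of the fields the commit mentions.
#[derive(Debug, Clone, PartialEq)]
pub struct BatchRow {
    pub id: String,
    pub status: BatchStatus,
    pub progress_pct: f32,
}

/// Sort rank: running jobs first. Queued-before-done is an assumption.
fn rank(status: BatchStatus) -> u8 {
    match status {
        BatchStatus::Running => 0,
        BatchStatus::Queued => 1,
        BatchStatus::Done => 2,
    }
}

/// Orders rows running-first and keeps at most `cap` of them.
/// `sort_by_key` is stable, so rows with the same status keep their input order.
pub fn top_rows(mut rows: Vec<BatchRow>, cap: usize) -> Vec<BatchRow> {
    rows.sort_by_key(|r| rank(r.status));
    rows.truncate(cap);
    rows
}
```

Calling `top_rows(rows, 10)` would give the cap described above.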

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add p50_ms/p55_ms to TickerPayload, conv/error/bytes/idle to EnginePayload,
queue stats to ConcurrencyPayload, full time-series fields to ThroughputPayload,
item counts and output_mode to BatchPayload.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tyStrip removed

- Ticker: add P50/P55 KPIs, drop Chromium/LibreOffice status blocks
- RoutesTable: sticky header, scrollable body, flex layout for height-fill
- StackedBarSeries: new SVG component with hover tooltip for dual engine series
- EngineConvChart: Chromium + LibreOffice stacked conv/sec chart
- QueueWaitChart: queue wait p95 bar chart with tone thresholds
- Engines: add conv/err%/bytes/idle sub-stat grid per engine
- Concurrency: add queue wait p95 and processing job count row
- Batches: richer row layout — mode badge, item counts, failed items highlight
- +page.svelte: ThroughputStrip + 2×2 chart grid above routes; ActivityStrip removed

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Dockerfile.dev / docker-compose.yml
- Fix bench/Cargo.toml missing from image; add stub ui/build for rust-embed
- Seed cargo-target and cargo-registry volumes with pdfbro ownership
- Mount ./bench and ./ui/build in both dev services

banner.rs / main.rs
- Add EngineStatus enum (Ready | Lazy | Unavailable | Disabled)
- Await eager engine starts in parallel before printing banner so status is accurate
- Lazy engines show [--] lazy; disabled engines omit the row entirely
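
A rough sketch of how the four statuses could map to banner rows. The enum variants and the `[OK] ready` / `[--] lazy` strings come from this PR; the `[!!]` tag for unavailable engines, the row layout, and the `banner_row` helper are assumptions for illustration only.

```rust
/// Variants as listed in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EngineStatus {
    Ready,
    Lazy,
    Unavailable,
    Disabled,
}

/// Returns the banner row for one engine, or `None` when the row is omitted.
/// Hypothetical helper: the actual banner.rs formatting is not shown here.
pub fn banner_row(name: &str, status: EngineStatus) -> Option<String> {
    let (tag, label) = match status {
        EngineStatus::Ready => ("[OK]", "ready"),
        EngineStatus::Lazy => ("[--]", "lazy"),
        // The "[!!]" tag is an assumption; the PR only names the status.
        EngineStatus::Unavailable => ("[!!]", "unavailable"),
        // Disabled engines omit the row entirely.
        EngineStatus::Disabled => return None,
    };
    Some(format!("{} {} {}", name, tag, label))
}
```

Returning `Option<String>` keeps the "omit the row" rule in one place, so the caller can simply skip `None` values when printing.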

batch_worker.rs / batch.rs / batch_state.rs
- Replace placeholder process_single_item stub with real engine calls
  (chromium url/html/markdown/screenshot, libreoffice)
- Persist uploaded files to {storage}/inputs/{batch_id}/ before spawning worker
- Build real ZIP output with zip crate; real merged PDF with engine::merge
- Record Prometheus conversion metrics per item (engine conv chart now fills)
- Use global semaphore (state.sem) instead of per-batch semaphore so
  concurrency_active spikes correctly during batch load

backend.rs
- Add is_alive() sync method to PdfBackend trait backed by atomic is_running()
  so the console sampler never blocks on the engine mutex during heavy load
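
The idea behind the sync liveness check, sketched under assumptions: the trait here is reduced to the two methods named in this PR, the `u64` return type of `idle_secs` is a guess, and `FakeBackend` is a hypothetical stand-in for the real engine type. The point is that `is_alive()` only loads an atomic flag and never touches the engine mutex.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Reduced sketch of the trait: only the methods mentioned in this PR.
pub trait PdfBackend {
    /// Cheap, non-blocking liveness check, safe to call from the sampler.
    fn is_alive(&self) -> bool;

    /// Seconds since the engine last did work; defaults to 0.
    /// The `u64` return type is an assumption.
    fn idle_secs(&self) -> u64 {
        0
    }
}

/// Hypothetical backend that tracks its running state in an atomic flag.
pub struct FakeBackend {
    running: AtomicBool,
}

impl FakeBackend {
    pub fn new(running: bool) -> Self {
        FakeBackend { running: AtomicBool::new(running) }
    }

    /// Flipped by the engine's lifecycle code on start and exit.
    pub fn set_running(&self, value: bool) {
        self.running.store(value, Ordering::Release);
    }
}

impl PdfBackend for FakeBackend {
    fn is_alive(&self) -> bool {
        // A single atomic load: no lock, no await, no waiting on a busy engine.
        self.running.load(Ordering::Acquire)
    }
}
```

The trade-off is that this reports "process is running" rather than "engine answered a health probe", which is usually enough for a dashboard status light.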

console_store.rs
- Replace single error_pct with server_error_pct (5xx only) and
  rate_limit_pct (429 only); 4xx client errors excluded from both
- Add prev_rate_limit_total for independent delta tracking
- Sampler uses is_alive() instead of healthy().await — eliminates sampler
  stall when Chromium is busy rendering batch items
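
A minimal sketch of the split, assuming both percentages are computed over all requests seen since the previous sampler tick. `ErrorBucket`, `Totals`, and `pct_since` are hypothetical names; the real code reads Prometheus counters rather than recording statuses directly.

```rust
/// Which ticker block a response status feeds, if any.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorBucket {
    ServerError,
    RateLimit,
}

/// 5xx goes to the server-error block, 429 to the rate-limit block;
/// every other status (including other 4xx) feeds neither.
pub fn bucket(status: u16) -> Option<ErrorBucket> {
    match status {
        429 => Some(ErrorBucket::RateLimit),
        500..=599 => Some(ErrorBucket::ServerError),
        _ => None,
    }
}

/// Hypothetical running totals, standing in for the Prometheus counters.
#[derive(Debug, Default, Clone, Copy)]
pub struct Totals {
    pub requests: u64,
    pub server_errors: u64,
    pub rate_limited: u64,
}

impl Totals {
    pub fn record(&mut self, status: u16) {
        self.requests += 1;
        match bucket(status) {
            Some(ErrorBucket::ServerError) => self.server_errors += 1,
            Some(ErrorBucket::RateLimit) => self.rate_limited += 1,
            None => {}
        }
    }
}

/// Returns (server_error_pct, rate_limit_pct) for the interval between two
/// snapshots. Each counter gets its own delta, so the two blocks move independently.
pub fn pct_since(prev: Totals, cur: Totals) -> (f64, f64) {
    let reqs = cur.requests.saturating_sub(prev.requests);
    if reqs == 0 {
        return (0.0, 0.0);
    }
    let pct = |now: u64, before: u64| 100.0 * now.saturating_sub(before) as f64 / reqs as f64;
    (
        pct(cur.server_errors, prev.server_errors),
        pct(cur.rate_limited, prev.rate_limited),
    )
}
```

`saturating_sub` guards against a counter reset between ticks, which would otherwise underflow.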

ui/src/lib/types.ts + Ticker.svelte
- TickerPayload: error_pct → server_error_pct + rate_limit_pct
- Ticker: ERRORS block → 5XX block + 429 block with independent thresholds

Batches.svelte
- Header shows "N queued · N running · N done" summary counts
- Scrollable list (max-height 240px) shows only active jobs
- Completed jobs drop off the list; done count increments in header

scripts/load_test.sh
- New load test script: 5 parallel batches + 10 concurrent URL renders per wave
  plus deliberate 4xx calls to verify they don't pollute error panels

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Memory: re-read cgroup memory.current each sampler tick instead of
  using the stale startup snapshot (fixes 637MB Docker vs 0.18GB UI gap)
- CPU: prefer cgroup v2 cpu.stat delta for container-accurate %;
  fall back to sysinfo only when cgroup v2 is unavailable
- Concurrency: clamp pill pct to 100%, use BUSY/WARN/OK labels,
  remove misleading ERR when active > max (it is load, not failure)
- Batch worker: record queue wait time into pdfbro_queue_wait_seconds
  histogram and track queue_processing gauge per item
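
For reference, the two cgroup v2 files named above are plain text: `memory.current` holds a single byte count, and `cpu.stat` holds `key value` lines including `usage_usec`. A hedged sketch of parsing them follows; the function names are hypothetical, and reading the files from the cgroup mount is left to the caller.

```rust
/// Parses the contents of cgroup v2 `memory.current` (one integer, in bytes).
pub fn parse_memory_current(contents: &str) -> Option<u64> {
    contents.trim().parse().ok()
}

/// Extracts `usage_usec` (cumulative CPU time in microseconds) from the
/// contents of cgroup v2 `cpu.stat`.
pub fn parse_usage_usec(cpu_stat: &str) -> Option<u64> {
    cpu_stat.lines().find_map(|line| {
        let mut parts = line.split_whitespace();
        match (parts.next(), parts.next()) {
            (Some("usage_usec"), Some(value)) => value.parse().ok(),
            _ => None,
        }
    })
}
```

Because `usage_usec` is cumulative, the sampler would keep the previous reading and work with the difference between ticks. Both functions return `None` on missing or malformed input, which gives the caller a natural point to fall back to sysinfo.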

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…in header

- Remove EngineConvChart and QueueWaitChart (queue wait p95 already shown
  as a number in the Concurrency panel)
- New CpuChart.svelte and MemChart.svelte replace them in Row 2 of the
  left column, each as a standalone bar-series card
- Add live conv/sec badge to each engine's header in Engines.svelte,
  fed from the last value of throughput chromium/libreoffice conv series
- Remove Resources card from right rail; Batches accepts flex:1 style
  to fill the freed vertical space

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Install bun in Dockerfile.dev dev-base stage
- On container start: bun install + initial vite build (synchronous,
  so the server always starts with the latest UI)
- vite build --watch runs in background — UI auto-rebuilds on every
  save to ui/src without restarting the Rust server
- Mount full ./ui into the container instead of just ./ui/build;
  ui-node-modules named volume keeps Linux binaries off the macOS host
- Add build:watch script to ui/package.json

Batches panel:
- Remove fixed 240px scroll cap; panel now grows to fill available space
- Empty state: clipboard SVG + "Queue is empty" message with hint
- Routes grid gets align-items:start so columns don't stretch to equal height

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Rename engine stat 'conv' → 'total conv' (lifetime count, not rate)
- Rename header badge 'c/s' → 'conv/s' (rate, unambiguous)
- Add 'time before slot acquired' sub-label under wait p95 so the
  metric's meaning is self-evident without documentation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…d POST

Key each route on "METHOD route" to preserve the method extracted from
the pdfbro_http_requests_total and pdfbro_http_request_duration_seconds
Prometheus labels. /health, /version, /favicon.ico etc. now show GET.
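
A small sketch of the keying scheme, assuming the method and route arrive as separate label values. `route_key` and `requests_by_route` are hypothetical names, and the `/convert` route in the usage note is made up for illustration.

```rust
use std::collections::BTreeMap;

/// Builds the "METHOD route" key so the method survives aggregation.
pub fn route_key(method: &str, route: &str) -> String {
    format!("{} {}", method, route)
}

/// Sums request counts per "METHOD route" key.
/// Each sample is (method label, route label, count).
pub fn requests_by_route(samples: &[(&str, &str, u64)]) -> BTreeMap<String, u64> {
    let mut out = BTreeMap::new();
    for (method, route, count) in samples {
        *out.entry(route_key(method, route)).or_insert(0) += *count;
    }
    out
}
```

With this key, `GET /convert` and `POST /convert` stay as separate rows, whereas keying on the route alone would merge them and lose the method.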

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…g restarts

Without -w flags cargo-watch watches all of /app, so every file vite
writes to ui/build/ during build:watch triggers a Rust recompile.
Scope it to crates/, Cargo.toml, and Cargo.lock only.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Layout:
- Outer page div is now a flex column so the main grid can flex:1 into
  the remaining viewport height
- Grid uses align-items:stretch so both columns grow to the same height
- Routes (left) and Batches (right) both fill to the bottom

CPU %:
- When no cgroup CPU limit is set: show delta_usec / 5,000,000 * 100, i.e. CPU
  time used over the 5s sampler interval as % of 1 core, same formula as
  docker stats (was dividing by num_host_cpus → ~1%)
- When a CPU quota is set: normalise by limit_cores → % of container quota
- Update CPU chart sub-label to '% of 1 core · cgroup cpu.stat'
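
The two CPU cases can be written as one function. This is a sketch under stated assumptions rather than the PR's code: `cpu_pct` and its parameter names are hypothetical, and the interval is passed in rather than hard-coded to 5s.

```rust
/// Converts a `usage_usec` delta into a CPU percentage.
///
/// - No quota (`limit_cores` is `None`): percent of one core, so a value
///   above 100 means more than one core was busy.
/// - Quota set: percent of the container's quota.
pub fn cpu_pct(delta_usec: u64, interval_secs: f64, limit_cores: Option<f64>) -> f64 {
    if interval_secs <= 0.0 {
        return 0.0;
    }
    // Average number of cores busy during the interval.
    let cores_used = delta_usec as f64 / (interval_secs * 1_000_000.0);
    match limit_cores {
        Some(limit) if limit > 0.0 => cores_used / limit * 100.0,
        _ => cores_used * 100.0,
    }
}
```

For example, 2,500,000 µs of CPU time over a 5s tick is half a core: 50% with no quota, or 25% of a 2-core quota.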

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds FAST=1 build arg to Dockerfile.test that runs `cargo test --lib`
(unit tests only, no BDD/integration, no Chrome/LO required, ~60s).

Also fixes grep pattern: reverts overly-broad ^error: addition that was
incorrectly catching the known LibreOffice atexit teardown noise as a
test failure. Only compiler errors (^error\[) and libtest verdict lines
are checked.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
